Recent advances in sophisticated synthetic speech generated by text-to-speech (TTS) or voice conversion (VC) systems pose threats to existing automatic speaker verification (ASV) systems. Since such synthetic speech is generated by diverse algorithms, generalization ability with limited training data is indispensable for a robust anti-spoofing system. In this work, we propose a transfer learning scheme based on the wav2vec 2.0 pretrained model with a variational information bottleneck (VIB) for the speech anti-spoofing task. Evaluation on the ASVspoof 2019 logical access (LA) database shows that our method improves the ability to distinguish unseen spoofed speech from genuine speech, outperforming current state-of-the-art anti-spoofing systems. Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset settings of the anti-spoofing task, demonstrating that our system is also robust with respect to data size and data distribution.
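The abstract does not detail the VIB head's architecture; the following is a minimal sketch of how a variational information bottleneck might sit on top of pooled wav2vec 2.0 features. The dimensions (`feat_dim=768`, `bottleneck_dim=128`), the mean pooling, and the KL weight `beta` are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal VIB head sketch for frame-level wav2vec 2.0 features (assumed shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    def __init__(self, feat_dim=768, bottleneck_dim=128, num_classes=2):
        super().__init__()
        self.mu = nn.Linear(feat_dim, bottleneck_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(feat_dim, bottleneck_dim)   # log-variance of q(z|x)
        self.classifier = nn.Linear(bottleneck_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) frame-level wav2vec 2.0 outputs.
        x = feats.mean(dim=1)                 # simple mean pooling over time
        mu, logvar = self.mu(x), self.logvar(x)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        logits = self.classifier(z)
        # KL divergence between q(z|x) and the standard normal prior N(0, I).
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return logits, kl

def vib_loss(logits, kl, labels, beta=1e-3):
    # Cross-entropy plus the beta-weighted information bottleneck penalty.
    return F.cross_entropy(logits, labels) + beta * kl
```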
In the robotics and computer vision communities, extensive studies have been conducted on surveillance tasks, including human detection, tracking, and motion recognition with a camera. As in other computer vision tasks, deep learning algorithms are widely utilized for these tasks. However, existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme situations such as harsh weather and low-illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person-view data annotated by well-trained annotators. Each pair contains multi-modal data (e.g., an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person-view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and describe methods for exploiting it with deep-learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be made available for download through a server.
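Purely as an illustration, one multi-modal pair could be represented as below; the field names and array shapes are hypothetical, and the authoritative format is defined by the dataset release at the linked repository.

```python
# Hypothetical container for one X-MAS multi-modal pair (illustrative only).
from dataclasses import dataclass
import numpy as np

@dataclass
class XMASPair:
    ir: np.ndarray       # infrared image, e.g. (H, W)
    rgb: np.ndarray      # RGB image, (H, W, 3)
    thermal: np.ndarray  # thermal image, (H, W)
    depth: np.ndarray    # depth map, (H, W)
    lidar: np.ndarray    # LiDAR scan as an (N, 4) array of x, y, z, intensity
    boxes: list          # annotated bounding boxes for detection/tracking
```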
Recently, end-to-end deep-learning-based stitching models have attracted increasing attention. However, the most challenging point in deep-learning-based stitching is obtaining pairs of input images with a narrow field of view and ground-truth images with a wide field of view captured from real-world scenes. To overcome this difficulty, we develop a weakly supervised learning mechanism to train the stitching model without real ground-truth images. In addition, we propose a stitching model that takes multiple real-world fisheye images as inputs and creates a 360-degree output image in an equirectangular projection format. In particular, our model consists of color consistency correction, warping, and blending, and is trained with perceptual and SSIM losses. The effectiveness of the proposed algorithm is verified on two real-world stitching datasets.
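A hedged sketch of the training objective described above, combining a perceptual (VGG feature) loss with an SSIM loss between the stitched output and the weak supervision target. The VGG layer choice, the simplified uniform-window SSIM, and the loss weights are assumptions, not the paper's exact setup.

```python
# Perceptual + SSIM objective sketch; inputs assumed to be normalized RGB batches.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self, layer_idx=16):  # up to relu3_3 in VGG16 features
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:layer_idx].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, pred, target):
        return F.l1_loss(self.features(pred), self.features(target))

def ssim_loss(pred, target, window=11, c1=0.01**2, c2=0.03**2):
    # Simplified SSIM using a uniform window (average pooling) instead of a Gaussian.
    pad = window // 2
    mu_p = F.avg_pool2d(pred, window, 1, pad)
    mu_t = F.avg_pool2d(target, window, 1, pad)
    var_p = F.avg_pool2d(pred * pred, window, 1, pad) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, window, 1, pad) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, window, 1, pad) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim.mean()

def stitching_loss(pred, target, perceptual, w_perc=1.0, w_ssim=1.0):
    # Weighted sum of the two terms; weights here are placeholders.
    return w_perc * perceptual(pred, target) + w_ssim * ssim_loss(pred, target)
```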
As non-binary people receive increasing attention in Western societies, gender-fair language strategies have begun to move away from binary (female/male only) conceptions of gender. So far, however, there are hardly any approaches that incorporate these identities into machine translation models. A lack of understanding of the sociotechnical implications of such technologies risks further reproducing linguistic mechanisms of oppression and mislabeling. In this paper, we describe the methods and results of a workshop on gender-fair language and language technologies, which was led and organized by ten researchers from TU Wien, St. Pölten UAS, FH Campus Wien, and the University of Vienna, and held in Vienna in autumn 2021. A wide range of interest groups and their representatives were invited to ensure that the topic could be addressed holistically. Accordingly, we aimed to include translators, machine translation experts, and non-binary individuals (as "community experts") on an equal footing. Our analysis shows that gender in machine translation requires a high degree of context sensitivity, that developers of such technologies need to position themselves cautiously in a process that is still being negotiated socially, and that flexible approaches currently seem most suitable. We then outline the steps that follow from our results for the field of gender-fair language technologies so that technological development can align adequately with social progress. - [German abstract manually added by arXiv admins]
Histopathology relies on the analysis of microscopic tissue images to diagnose disease. A critical part of tissue preparation is staining, in which dyes are used to make salient tissue components more distinguishable. However, differences in laboratory protocols and scanning devices lead to significant confounding appearance variations in the corresponding images. Such variation increases human error and inter-rater variability, and hinders the performance of automatic or semi-automatic methods. In this paper, we introduce an unsupervised adversarial network to translate (and hence normalize) whole-slide images across multiple data-acquisition domains. Our key contributions are: (i) an adversarial architecture that learns across multiple domains with a single generator-discriminator network, using an information-flow branch that optimizes for perceptual loss, and (ii) the inclusion of an additional feature-extraction network during training that guides the transformation network to keep all structural features in tissue images intact. We (i) demonstrate the effectiveness of the proposed method on H&E slides from 120 cases of kidney cancer, and (ii) show the benefits of the approach on more general problems, such as flexible illumination-based natural-image enhancement and light-source adaptation.
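A hedged sketch of the generator objective implied above: an adversarial term from the single shared discriminator plus a structure-preservation term from a fixed feature-extraction network. The module interfaces, loss form, and weights are placeholders, not the paper's exact architecture.

```python
# Generator objective sketch for multi-domain stain translation (illustrative).
import torch
import torch.nn.functional as F

def generator_loss(generator, discriminator, feat_net, x, target_domain,
                   w_adv=1.0, w_struct=10.0):
    """x: batch of slide tiles from a source domain; target_domain: domain labels."""
    y = generator(x, target_domain)  # translate x into the target stain domain
    # Adversarial term: the shared discriminator should judge y as real
    # in the target domain (non-saturating GAN loss).
    d_out = discriminator(y, target_domain)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    # Structure term: tissue morphology must survive re-staining, so features
    # from a frozen extractor should match between the input and its translation.
    with torch.no_grad():
        f_x = feat_net(x)
    struct = F.l1_loss(feat_net(y), f_x)
    return w_adv * adv + w_struct * struct
```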
From the earliest stages of human life, communication, viewed as a process of social interaction, has always been the best path to consensus between parties. Understanding and credibility in this process are fundamental for a mutual agreement to be validated. But how can this communication be made to reach the masses? This is the main challenge when what is sought is the diffusion of information and its approval. In this context, this study presents the ALT software, developed from original readability metrics adapted to the Portuguese language and available on the web, to reduce difficulties in communication. The development of the software was motivated by Habermas' theory of communicative action, which uses a multidisciplinary approach to measure the credibility of discourse in the communication channels used to build and maintain a safe and healthy relationship with the public.
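ALT's original metrics are not spelled out in the abstract; purely as an illustration of the kind of readability formula that gets adapted for Portuguese, here is the classic Flesch Reading Ease index with a crude vowel-group syllable counter. The coefficients are the English originals; adapted versions recalibrate them.

```python
# Classic Flesch Reading Ease with an approximate Portuguese syllable counter.
import re

def count_syllables_pt(word):
    # Approximation: count maximal vowel groups (rough but workable for Portuguese).
    return max(1, len(re.findall(r"[aeiouáéíóúâêôãõà]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"\w+", text, flags=re.UNICODE)
    syllables = sum(count_syllables_pt(w) for w in words)
    asl = len(words) / sentences          # average sentence length
    asw = syllables / max(1, len(words))  # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

print(flesch_reading_ease("A comunicação é a base do consenso entre as partes."))
```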
This digital book contains a practical and comprehensive introduction to everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks for a quick start. Beyond standard supervised learning from data, we will look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have tremendous potential to fundamentally change what computer simulations can achieve.
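As a taste of the "physical loss constraints" idea, a common pattern (covered in far more depth in the book itself) combines a supervised data term with the autograd-evaluated residual of a governing PDE. The 1D heat equation below is a stand-in example, not taken from the book's notebooks.

```python
# Physics-informed loss sketch: fit data while penalizing the residual of
# the 1D heat equation u_t = nu * u_xx, with derivatives from autograd.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64),
                    nn.Tanh(), nn.Linear(64, 1))  # input is (x, t)

def pde_residual(xt, nu=0.1):
    xt = xt.requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t - nu * u_xx  # vanishes wherever the physics holds

def loss(xt_data, u_data, xt_colloc, w_phys=1.0):
    # Supervised term on observed samples plus PDE term on collocation points.
    data_term = ((net(xt_data) - u_data) ** 2).mean()
    phys_term = (pde_residual(xt_colloc) ** 2).mean()
    return data_term + w_phys * phys_term
```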
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL), yet it has been criticized for learning inefficiency. We believe the insufficient utilization of training signals is responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under a disjointness constraint, raising the number of tokens used for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to predict invisible (masked) and visible (unmasked) tokens with superior learning targets, respectively. Rooted in orthogonal perspectives on training-efficiency improvement, DM and JD cooperatively accelerate training convergence without sacrificing model generalization. Concretely, DM can train a ViT with half the effective training epochs (3.7 times less time-consuming) while reporting competitive performance. With JD, our DMJD clearly improves linear-probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and models will be made public at https://github.com/mx-mark/DMJD.
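A sketch of the disjoint masking (DM) sampling step as described: several masked views are drawn per image with pairwise-disjoint masked token sets, so more tokens serve as reconstruction targets. The view count and mask ratio below are illustrative; disjointness requires num_views * mask_ratio <= 1.

```python
# Disjoint mask sampling: one shuffle of token indices, sliced into views.
import torch

def disjoint_masks(num_tokens, num_views=2, mask_ratio=0.5, device="cpu"):
    assert num_views * mask_ratio <= 1.0, "views' masked sets must fit disjointly"
    n_mask = int(num_tokens * mask_ratio)
    perm = torch.randperm(num_tokens, device=device)
    masks = torch.zeros(num_views, num_tokens, dtype=torch.bool, device=device)
    for v in range(num_views):
        masks[v, perm[v * n_mask:(v + 1) * n_mask]] = True  # True = masked
    return masks  # (num_views, num_tokens); rows are pairwise disjoint

masks = disjoint_masks(196, num_views=2, mask_ratio=0.5)  # 14x14 ViT token grid
print(masks.sum(dim=1), (masks[0] & masks[1]).any())      # 98 masked each, no overlap
```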
Cohn and Umans proposed a framework for developing fast matrix multiplication algorithms based on embedding the computation in certain group algebras. In subsequent work with Kleinberg and Szegedy, they connected this to the search for combinatorial objects called strong uniquely solvable puzzles (strong USPs). We begin a systematic computer-aided search for these objects. We develop and implement constraint-based algorithms built on reductions to $\mathrm{SAT}$ and $\mathrm{IP}$ to verify that puzzles are strong USPs and to search for large strong USPs. We produce tight bounds on the maximum size of a strong USP for width $k \le 5$, construct puzzles of small width that are larger than those of previous work, and improve the upper bounds on strong USP size for $k \le 12$. Although our work only deals with puzzles of small constant width, the strong USPs we find imply matrix multiplication algorithms that run in $O(n^\omega)$ time with exponent $\omega \le 2.66$. While these do not beat the fastest known algorithms, our work provides evidence and, perhaps, a path toward finding families of strong USPs that imply matrix multiplication algorithms more efficient than those currently known.
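The paper checks the strong-USP property via SAT/IP reductions; the brute-force verifier below, written from the Cohn-Kleinberg-Szegedy-Umans definition, only illustrates what is being decided and is feasible solely for tiny puzzles, since it enumerates all |U|!^3 permutation triples.

```python
# Brute-force strong-USP check: for every triple of row permutations that are
# not all equal, some row u and column i must have exactly two of the
# conditions (pi1(u)_i == 1, pi2(u)_i == 2, pi3(u)_i == 3) hold.
from itertools import permutations

def is_strong_usp(puzzle):
    """puzzle: list of rows, each a tuple over the alphabet {1, 2, 3}."""
    rows = list(range(len(puzzle)))
    width = len(puzzle[0])
    for p1 in permutations(rows):
        for p2 in permutations(rows):
            for p3 in permutations(rows):
                if p1 == p2 == p3:
                    continue  # equal triples are exempt by definition
                if not any(
                    (puzzle[p1[u]][i] == 1) + (puzzle[p2[u]][i] == 2)
                    + (puzzle[p3[u]][i] == 3) == 2
                    for u in rows for i in range(width)
                ):
                    return False  # a violating permutation triple exists
    return True

print(is_strong_usp([(1, 2), (3, 3)]))  # tiny width-2 example
```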
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as cluster centers so as to minimize the maximum within-cluster distance. The algorithm is based on a reduced-space branch-and-bound scheme and guarantees convergence to the global optimum in a finite number of steps by branching only on the space of centers. To improve efficiency, we design a two-stage decomposable lower bound whose solution can be derived in closed form. In addition, we propose several acceleration techniques to narrow down the region of centers, including bound tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets demonstrate that our algorithm can solve K-center problems to global optimality within 4 hours for ten million samples in serial mode and one billion samples in parallel mode. Moreover, compared with state-of-the-art heuristic methods, the global optimum obtained by our algorithm reduces the objective function by 25.8% on average across all synthetic and real-world datasets.
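The branch-and-bound algorithm itself is not reproduced here; the sketch below only fixes notation, showing the K-center objective being minimized and the classic greedy (Gonzalez) 2-approximation of the kind used as a heuristic baseline.

```python
# K-center objective and the greedy farthest-point heuristic (baseline sketch).
import numpy as np

def kcenter_objective(X, centers):
    # Maximum over samples of the distance to the nearest chosen center.
    d = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
    return d.min(axis=1).max()

def greedy_kcenter(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())  # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers

X = np.random.default_rng(1).normal(size=(1000, 2))
c = greedy_kcenter(X, k=5)
print(kcenter_objective(X, c))  # heuristic value; the paper's BnB certifies optimality
```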